Every morning, Midge, a healthcare worker, embarks on her daily journey from the quiet streets of Pottstown1 in Montgomery County to the bustling corridors of the Hospital of the University of Pennsylvania in Philadelphia. Her commute is more than just a physical journey—it’s a daily ordeal, wrestling with the infrequent and unreliable bus service that connects the suburban sprawl to the urban core. On a good day, Midge waits 30 minutes for Bus 93; on a bad day, the wait can exceed an hour. She must then transfer to the Norristown High Speed Line, extending her total morning commute to over 2.5 hours—five times longer than Philadelphia’s average commute time.2 This inconsistency isn’t just a personal inconvenience; it’s a stark reflection of the broader issues of transit inequities that plague many in the suburbs, where buses are few and far between despite the pressing need.
Midge’s story is not unique in Greater Philadelphia—a region where 58.2 percent of public transit users are women and 66.4 percent are people of color. These individuals predominantly populate essential sectors like healthcare, education, and social assistance which employs 25.6 percent of suburban county workers who depend on public transportation. Despite playing critical roles in their communities, these workers earn median annual earnings approximately $7,673 below the regional average. This economic disparity is exacerbated by the transit challenges they face, where the gap in accessibility and frequency of service widens the divide between different socioeconomic groups. Additionally, a significant portion of the workforce, about 35%, lives below the poverty line, heavily relying on this patchwork public transport system that struggles to meet their daily needs.3
Midge’s daily commute sheds light on a more significant issue: the critical role of public transportation in providing access to essential services and economic opportunities, especially in suburban communities. Studies highlight a robust negative correlation between unemployment rates and proximity to transit access. In New York City, neighborhoods with some but limited transit access experienced a significantly higher unemployment rate (12.6%) compared to areas with robust transit options (8.1%). 4 This lack of transit access limits physical and economic mobility, contributing to income inequality and higher public assistance dependency.
Moreover, reliable public transportation is pivotal for healthcare access. The National Health Interview Survey (2002) estimated that 3.6 million people miss non-emergency medical appointments annually due to transportation issues. This issue disproportionately affects communities of color, who rely more heavily on public transportation compared to white populations. 5 Midge’s commute is not just a matter of convenience; it’s about accessing crucial healthcare services that manage chronic conditions and provide necessary treatments.
Transportation also plays a fundamental role in educational access. The “geography of opportunity” significantly shapes educational outcomes through accessible transportation, enabling children and young people to reach schools and participate in extracurricular activities.6 A study in Minneapolis found that high school students with access to an unlimited bus and rail pass attended school more regularly and achieved higher GPAs by 0.23 than students without a pass, 7 highlighting the importance of reliable transit in educational attainment.
Using the latest SEPTA General Transit Feed Specification (GTFS) data8, it reveals that a staggering 80% of SEPTA’s trips are made by bus, showcasing the critical role of bus transit in Philadelphia’s transportation network. This statistic underlines the heavy reliance on buses across the city, not just in suburban areas.
# Import GTFS data
stops <- read.csv("data/google_bus/stops.txt", header = TRUE, sep = ",")
stop_times <- read.csv("data/google_bus/stop_times.txt", header = TRUE, sep = ",")
trips <- read.csv("data/google_bus/trips.txt", header = TRUE, sep = ",")
shapes <- read.csv("data/google_bus/shapes.txt", header = TRUE, sep = ",")
routes <- read.csv("data/google_bus/routes.txt", header = TRUE, sep = ",")
calendar <- read.csv("data/google_bus/calendar.txt", header = TRUE, sep = ",")
calendar_dates <- read.csv("data/google_bus/calendar_dates.txt", header = TRUE, sep = ",")
# Import Swiftly data
swiftly <- st_read("data/SwiftlyShapes/geoJSONs/SEPTASurface_HighResSpeed_v2.json")
# Join stop_times and trips, from now on dat will be our main data set
dat <- left_join(stop_times, trips, by = "trip_id")
# join dat and routes
dat <- left_join(dat, routes, by = "route_id")
#determine route_type
dat <- dat %>%
mutate(mode = if_else(route_type == 0, "Trolley", ifelse(route_type == 1, "Metro", ifelse(route_type == 3, "Bus", "Trolleybus"))))
# Mode share (not ridership)
dat_trip_by_mode <- dat %>%
group_by(mode) %>%
summarise(mode_count = n(), .groups = 'drop') %>%
mutate(total_count = sum(mode_count),
mode_share = round(mode_count/sum(mode_count)*100, digits = 2)) %>%
ungroup()
# # Plot mode share using ggplot2 and the custom palette
ggplot(dat_trip_by_mode, aes(x = mode, y = mode_share, fill = mode)) +
geom_bar(stat = "identity", width = 0.75) +
scale_fill_manual(values = bus_revolution_palette(length(unique(dat_trip_by_mode$mode)))) +
geom_text(aes(label = paste0(mode_share, "% (", mode_count, " trips)")),
vjust = -0.5, color = "black", size = 2.5) +
labs(title = "Mode Share by Transit Type",
x = "Mode",
y = "Mode Share (%)",
fill = "Mode",
caption = "Source: SEPTA GTFS Data 2024") +
theme_minimal() +
theme(legend.position = "right",
plot.title = element_text(hjust = 0.5))
#
# # total trip bus & trolley bus 1.980.505, total observations 2.026.677
#
# ggplot() +
# # Add service period background rectangles with reduced alpha for subtlety
# geom_rect(data = service_periods, aes(xmin = start_hour, xmax = end_hour, ymin = 0, ymax = max(hourly_trips$number_of_trips), fill = peak_category), alpha = 0.2) +
# # Add the line plot for weekday trips with adjusted point sizes
# geom_line(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#F2AE2E", size = 1) +
# geom_point(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 3) +
# # Add the line plot for weekend trips with a new color for better differentiation
# geom_line(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#0072B2", size = 1) +
# geom_point(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#D55E00", size = 3) +
# # Define scales and labels with appropriate adjustments
# scale_x_continuous(breaks = 0:23) +
# scale_y_continuous(labels = scales::comma) +
# scale_fill_manual(values = service_period_colors) +
# scale_color_manual(values = c("Weekday" = "#F2AE2E", "Weekend" = "#0072B2")) +
# labs(title = "Number of Trips by Hour with Service Periods",
# x = "Hour of Day",
# y = "Number of Trips",
# fill = "Service Period",
# color = "Day Type") +
# theme_minimal() +
# theme(legend.position = "bottom",
# plot.title = element_text(hjust = 0.5))
# CREATE LINE GRAPH OF TRIPS BY SERVICE PERIODS
# Function to normalize times greater than 24:00:00. Important step: check the stop_times$arrival_time and departure_time values if it's greater than 24:00
normalize_time <- function(time_str) {
time_parts <- strsplit(time_str, ":")[[1]]
hours <- as.numeric(time_parts[1])
minutes <- as.numeric(time_parts[2])
seconds <- as.numeric(time_parts[3])
if (hours >= 24) {
hours <- hours - 24
time_str <- sprintf("%02d:%02d:%02d", hours, minutes, seconds)
}
return(time_str)
}
#identify peak_pm category
dat <- dat %>%
mutate(departure_time = sapply(departure_time, normalize_time), # departure_time's normalized
departure_time = lubridate::hms(departure_time),
hour = hour(departure_time), # Extract hour component for easier condition checking
peak_category = if_else(hour >= 4 & hour < 6, "Early Morning",
if_else(hour >= 6 & hour < 9, "AM Peak",
if_else(hour >= 9 & hour < 15, "Mid Day",
if_else(hour >= 15 & hour < 18, "PM Peak",
if_else(hour >= 18 & hour < 22, "Evening",
ifelse(hour >= 22 & hour < 24, "Late Night", "Owl")))))))
# # Check if there are any NA values in the 'departure_time' column
# summarise(dat,
# na_count = sum(is.na(departure_time)), #Count NA values
# total_rows = n(), # Total number of rows for context
# percentage_na = na_count / total_rows * 100 # Percentage of NA values
# )
durations <- c("Owl" = 4, "Early Morning" = 2, "AM Peak" = 3, "Mid Day" = 6, "PM Peak" = 3, "Evening" = 4, "Late Night" = 2)
dat_service_period <- dat %>%
filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
group_by(peak_category)%>%
summarise(peak_count = n(), .groups = 'drop') %>%
mutate(total_count = sum(peak_count),
peak_share = round(peak_count/sum(peak_count)*100, digits = 2)) %>%
arrange(factor(peak_category, levels = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"))) %>%
ungroup()
# Create a data frame for service periods
service_periods <- data.frame(
peak_category = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"),
start_hour = c(0, 4, 6, 9, 15, 18, 22),
end_hour = c(4, 6, 9, 15, 18, 22, 24)
)
# CREATE BAR CHART OF TRIPS BY SERVICE TYPES
# Ensure 'calendar' has 'day_type' defined
calendar <- calendar %>%
mutate(
day_type = case_when(
monday == 1 & tuesday == 1 & wednesday == 1 & thursday == 1 & friday == 1 & saturday == 0 & sunday == 0 ~ "Weekday",
(saturday == 1 | sunday == 1) ~ "Weekend",
TRUE ~ "irregular"
)
)
# Calendar_dates includes 'service_id', 'date', and 'exception_type'
# Ensure 'calendar_dates' has 'day_type' defined based on the date
calendar_dates <- calendar_dates %>%
filter(exception_type == 1) %>% #we only need exception_type == 1, those added to the service
mutate(
date = ymd(date), # Convert date format if necessary
day_of_week = wday(date, label = TRUE, week_start = 1),
day_type = case_when(
day_of_week %in% c("Mon", "Tue", "Wed", "Thu", "Fri") ~ "Weekday", # Monday to Friday
TRUE ~ "Weekend" # Saturday and Sunday
)
)
# Join calendar_dates with calendar
calendar_joined <- calendar_dates %>%
full_join(calendar, by = "service_id", suffix = c(".dates", ".cal")) %>%
mutate(
final_day_type = case_when(
# Use calendar_dates' day_type if exception_type indicates added service
exception_type == 1 & day_type.dates == "Weekday" ~ "Weekday",
exception_type == 1 & day_type.dates == "Weekend" ~ "Weekend",
# Default to calendar's day_type where there's no specific exception noted
TRUE ~ day_type.cal
)
) %>%
select(service_id, final_day_type) %>%
group_by(service_id, final_day_type) %>%
summarise(count = n(), .groups = 'drop') %>%
arrange(service_id, desc(count)) %>%
distinct(service_id, .keep_all = TRUE) %>% # Make sure no double day types for service ID
select(-count)
dat <- dat %>%
left_join(calendar_joined, by = "service_id")
# SET UP STOPS CATEGORY. This step is crucial as we'll use stops information and join them for each kind of maps
# Convert stops data to sf object
stops <- stops %>%
st_as_sf(coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)
# Load PA and SEPTA region map for base map
PA <- st_read("data/PaCounty2024_05.geojson")
SEPTA_region_map <- PA[PA$COUNTY_NAM %in% c("MONTGOMERY", "BUCKS", "PHILADELPHIA", "DELAWARE", "CHESTER"), ] %>%
select(PA_CTY_COD, COUNTY_NAM, FIPS_COUNT)
# Load Center City boundary
center_city <- st_read("data/20240708_CenterCity/CenterCity.shp")
# Ensure all datasets are in the same CRS
SEPTA_region_map <- st_transform(SEPTA_region_map, crs = st_crs(stops))
center_city <- st_transform(center_city, crs = st_crs(stops))
# Perform the spatial intersection for county
stops <- stops %>%
st_join(SEPTA_region_map, left = TRUE, join = st_intersects) %>%
rename(county = COUNTY_NAM) %>%
mutate(center_city = st_within(stops, center_city, sparse = FALSE))%>%
mutate(center_city = rowSums(center_city) > 0) # Convert logical matrix to vector
# Categorize the stops
stops <- stops %>%
mutate(category = case_when(
county == "PHILADELPHIA" & center_city == TRUE ~ "PHILADELPHIA-Center_City",
county == "PHILADELPHIA" & center_city == FALSE ~ "PHILADELPHIA-Not_Center_City",
TRUE ~ county
)) %>%
select(-"center_city")
# SET UP HOURLY AVERAGE TRIP DATA FOR EACH BUS STOP
dat_trip_count_hr <- dat %>%
filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
group_by(stop_id, mode, final_day_type) %>%
summarise(trip_count = n(),
All_Routes = paste(unique(route_id), collapse = ", "),
.groups = 'drop') %>%
mutate(avg_trip_hr = round(trip_count/24)) %>%
mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
ifelse (avg_trip_hr >= 6, "10 MAX",
ifelse (avg_trip_hr >= 4, "15 MAX",
ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))
dat_trip_count_hr <- left_join(dat_trip_count_hr, stops, by = "stop_id")
dat_trip_count_hr <- st_as_sf(dat_trip_count_hr, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)
# Assuming 'county' is the column with the county information
dat_trip_count_hr <- dat_trip_count_hr %>%
mutate(category = if_else(county == "PHILADELPHIA", "urban", "suburban"))
# Simplify the max frequency into two categories
dat_trip_count_hr <- dat_trip_count_hr %>%
mutate(simplified_freq = case_when(
max_freq %in% c("5 MAX", "10 MAX", "15 MAX") ~ "15 minutes or less",
TRUE ~ "More than 15 minutes"
))
# Remove rows where 'category' is NA
dat_trip_count_hr <- dat_trip_count_hr %>% filter(!is.na(category))
# sum(dat_trip_count_hr$trip_count) #1980505 <- Always make sure you retain the number of observation
The disparities in bus service efficiency between urban and suburban areas are starkly highlighted by frequency data. In suburban areas, the majority of bus stops suffer from infrequent services. This contrasts sharply with urban areas, where bus stops generally provide more frequent services. These differences underline significant inequities in transit accessibility, impacting daily commuters like Midge who rely heavily on these services.
In the following leaflet map, we illustrate these disparities. You’ll see that bus stops in suburban areas predominantly have longer waiting times, exceeding 15 minutes, whereas urban stops typically has waiting time under 15 minutes, demonstrating more frequent services. This visual representation emphasizes the urgent need for targeted improvements in suburban transit systems to bridge the gap and enhance overall service reliability and efficiency.
# SET UP AVERAGE TRIP DATA PER SERVICE PERIOD FOR EACH BUS STOP
dat_trip_count_prd <- dat %>%
filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
group_by(stop_id, peak_category, mode, final_day_type) %>%
summarise(trip_count = n(),
All_Routes = paste(unique(route_id), collapse = ", "),
.groups = 'drop') %>%
mutate(
hours = durations[peak_category],
avg_trip_hr = round(trip_count / hours)
) %>%
select(-hours) %>%
mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
ifelse (avg_trip_hr >= 6, "10 MAX",
ifelse (avg_trip_hr >= 4, "15 MAX",
ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))
dat_trip_count_prd <- left_join(dat_trip_count_prd, stops, by = "stop_id")
dat_trip_count_prd <- st_as_sf(dat_trip_count_prd, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)
# sum(dat_trip_count_prd$trip_count) #1980505 <- Always make sure you retain the number of observation
SEPTA_region_map <- st_transform(SEPTA_region_map, st_crs(dat_trip_count_hr))
# Define color palette for simplified frequency using the bus revolution colors
pal <- colorFactor(c("#A6123A", "#655BA6"), domain = c("15 minutes or less", "More than 15 minutes"))
# Create the leaflet map
m_avg_trip_hr <- leaflet() %>%
addProviderTiles(providers$CartoDB.Positron) %>%
addPolygons(data = SEPTA_region_map, color = "black", weight = 1, fill = FALSE) %>%
addCircleMarkers(
data = dat_trip_count_hr %>% filter(final_day_type == "Weekday" & !is.na(category)),
group = ~category, # Group data based on urban or suburban
lng = ~stop_lon,
lat = ~stop_lat,
color = ~pal(simplified_freq), # Color based on simplified frequency
radius = 1,
opacity = 0.8,
popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
"<b>Stop Name:</b> ", stop_name, "<br>",
"<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
"<b>Max Frequency:</b> ", simplified_freq, "<br>",
"<b>All Routes:</b> ", All_Routes)
) %>%
addLayersControl(
overlayGroups = c("urban", "suburban"),
options = layersControlOptions(collapsed = FALSE)
) %>%
addControl(html = "<div style='background-color: white; padding: 5px;'><strong>Source:</strong> SEPTA GTFS Data 2024</div>", position = "bottomleft")
# Display the map
m_avg_trip_hr